Neuronal systems that are involved in reinforcement
learning must solve the temporal credit assignment problem: how can a stimulus be associated with a reward that is delayed in time? Theoretical studies [1-3] have postulated that neural activity underlying learning 'tags' synapses with an 'eligibility trace', and that the subsequent arrival of a reward converts these eligibility traces into actual modifications of synaptic efficacy. While eligibility traces provide one simple solution to the temporal credit assignment problem, they alone do not constitute a stable learning rule, because they provide no mechanism for signaling when learning should cease. To attain stability, rules involving eligibility traces often assume that once the association is learned, further learning is prevented by inhibiting the reward stimulus [1,3,4].
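The following minimal sketch illustrates the single-trace scheme described above. It assumes an exponentially decaying trace tagged once per trial and a reward of fixed size arriving after a fixed delay. The function names, the exponential form, and all parameter values are illustrative assumptions, not taken from [1-4]. The sketch shows that, without a stopping mechanism, every rewarded trial adds the same increment, so the weight grows without bound. Suppressing the reward once the association is learned makes the weight converge; here that suppression is modeled, hypothetically, as reward = 1 - w/w_target.

```python
import math

def trace_at(t_since_tag, tau):
    """Eligibility trace remaining t_since_tag seconds after a unit tag
    (exponential decay is an assumption of this sketch)."""
    return math.exp(-t_since_tag / tau)

def run_trials(n_trials, delay=1.0, tau=2.0, eta=0.1,
               inhibit_when_learned=False, w_target=1.0):
    """Return the synaptic weight after each trial.

    Each trial: activity tags the synapse at t = 0, reward arrives at
    t = delay, and the reward converts whatever trace remains into a
    weight change.
    """
    w = 0.0
    history = []
    for _ in range(n_trials):
        reward = 1.0
        if inhibit_when_learned:
            # Hypothetical stand-in for inhibiting the reward stimulus:
            # reward shrinks as the learned weight approaches w_target.
            reward = max(0.0, 1.0 - w / w_target)
        e = trace_at(delay, tau)   # trace left when the delayed reward arrives
        w += eta * reward * e      # reward converts the trace into plasticity
        history.append(w)
    return history

unbounded = run_trials(1000)                            # keeps growing linearly
bounded = run_trials(1000, inhibit_when_learned=True)   # settles near w_target
print(round(unbounded[-1], 2), round(bounded[-1], 4))
```

With a constant reward, the increment per trial is eta * exp(-delay/tau) regardless of how much has already been learned. This is exactly the missing stopping signal described in the text.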

Although synaptic plasticity is responsible for reinforcement learning in the brain, theories of reinforcement learning are generally abstract and involve neither neurons nor synapses. Conversely, biophysical theories of synaptic plasticity typically model unsupervised learning and ignore the contribution of reinforcement. Here we describe a biophysically based theory of reinforcement-modulated synaptic plasticity and postulate the existence of two eligibility traces with different temporal profiles: one corresponding to the induction of LTP, and the other to the induction of LTD. The traces have different kinetics, and the difference in their magnitudes at the time of reward determines whether the resulting synaptic modification is LTP or LTD. Because the two traces decay at different rates, they can compete temporally at the reward time, which provides a mechanism for stable reinforcement learning without the need to inhibit
reward. We test this novel reinforcement-learning rule on an experimentally motivated model of a recurrent cortical network [5] and compare the model results to experimental results at both the cellular and circuit levels. We further suggest that these eligibility traces are implemented via kinases and phosphatases, thus accounting for results at both the cellular and system levels.
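The toy sketch below makes the temporal-competition argument concrete. It is not the authors' model. Both traces are tagged by the same event and decay exponentially, and the weight change at reward is taken to be proportional to the LTP trace minus the LTD trace. The source does not state which trace decays faster. The values below (LTD trace larger but shorter-lived) are hypothetical and are chosen so that the net change switches sign at a crossover interval s*.

The `learn` loop adds a second hypothetical ingredient, standing in for the recurrent network of [5], which is not reproduced here: the weight w sets when the tagging activity ends, so learning itself shifts the interval between tag and reward. Reward is delivered on every trial, yet the weight settles where the two traces balance.

```python
import math

# Hypothetical parameters (not from the source): the LTD trace is larger but
# decays faster; the LTP trace is smaller but longer-lived.
A_P, TAU_P = 1.0, 4.0   # LTP trace amplitude and decay time constant (s)
A_D, TAU_D = 2.0, 1.0   # LTD trace amplitude and decay time constant (s)

def ltp_trace(s):
    """LTP eligibility trace s seconds after tagging (exponential decay assumed)."""
    return A_P * math.exp(-s / TAU_P)

def ltd_trace(s):
    """LTD eligibility trace s seconds after tagging (exponential decay assumed)."""
    return A_D * math.exp(-s / TAU_D)

def delta_w(s, eta=0.1):
    """Weight change when reward arrives s seconds after tagging:
    positive (LTP) if the LTP trace is larger, negative (LTD) otherwise."""
    return eta * (ltp_trace(s) - ltd_trace(s))

def crossover_interval():
    """Interval s* at which the traces are equal, so reward causes no net change."""
    # A_P exp(-s/TAU_P) = A_D exp(-s/TAU_D)  =>  s* = ln(A_D/A_P) / (1/TAU_D - 1/TAU_P)
    return math.log(A_D / A_P) / (1.0 / TAU_D - 1.0 / TAU_P)

def learn(w0, t_reward=5.0, n_trials=1000, eta=0.1):
    """Toy learning loop with reward on every trial (never inhibited).

    Hypothetical map: tagging activity ends at time D = w, so the
    tag-to-reward interval is t_reward - w (clipped at 0 if activity
    outlasts the reward).
    """
    w = w0
    for _ in range(n_trials):
        s = max(0.0, t_reward - w)   # interval between tagging and reward
        w += delta_w(s, eta)         # temporal competition between the traces
    return w

s_star = crossover_interval()
print(round(s_star, 3), round(learn(0.0), 3), round(learn(4.9), 3))
```

Whether the weight starts below or above the crossover, it converges to t_reward - s*. Short intervals favor the LTD trace and long intervals favor the LTP trace, so every change pushes the interval back toward s*. If the opposite trace assignment is used in this toy mapping, the fixed point becomes unstable. The stability shown here therefore depends on the hypothetical choices, not merely on having two traces.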